Crawling rich internet applications: the state of the art

Authors

Suryakant Choudhary

Mustafa Emre Dincturk

Seyed M. Mirtaheri

Ali Moosavi

Gregor von Bochmann

Guy-Vincent Jourdan

Iosif-Viorel Onut

Abstract

Web applications have come a long way, both in terms of adoption to provide information and services, and in terms of the technologies to develop them. With the emergence of richer and more advanced technologies such as AJAX, web applications have become more interactive, responsive and user friendly. These applications, often called Rich Internet Applications (RIAs), changed web applications in two ways: (1) dynamic manipulation of client-side state and (2) asynchronous communication with the server. However, such techniques also introduced new challenges. One important challenge is the difficulty of automatically crawling these new applications. Without crawling, RIAs can be neither indexed nor tested automatically. Traditional crawlers are not able to handle these newer technologies. This paper surveys the research on addressing the problem of crawling RIAs and provides some experimental results to compare existing crawling strategies. In addition, we provide some future directions for research on crawling RIAs.

Copyright © IBM Canada Ltd., 2012. Permission to copy is hereby granted provided the original copyright notice is reproduced in copies made. Disclaimer: The views expressed in this article are the sole responsibility of the authors and do not necessarily reflect those of IBM. Trademarks: IBM and AppScan are trademarks or registered trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at www.ibm.com/legal/copytrade.shtml.

Similar resources

Indexing Rich Internet Applications Using Components-Based Crawling

Automatic crawling of Rich Internet Applications (RIAs) is a challenge because client-side code modifies the client dynamically, fetching server-side data asynchronously. Most existing solutions model RIAs as state machines with DOMs as states and JavaScript events execution as transitions. This approach fails when used with “real-life”, complex RIAs, because the size of the produced model is m...

On Location Aware Internet

An important aspect of performance for Internet-based applications is network delay (measured in terms of bandwidth and latency). The contribution and the basic motivation of Location Aware Internet is to enable clients to easily find the nearest (in terms of latency) out of a number of servers that can service a specific request. Location Aware Internet can significantly improve the performanc...

A density based clustering approach to distinguish between web robot and human requests to a web server

Today, the world's dependence on the Internet and the emergence of Web 2.0 applications are significantly increasing the need for web robots that crawl sites to support services and technologies. Regardless of their advantages, robots may consume bandwidth and reduce the performance of web servers. Despite a variety of research, there is no accurate method for classifying huge data ...

A Strategy for Efficient Crawling of Rich Internet Applications

This thesis studies the problem of crawling rich internet applications. These applications are built using advanced web technologies which allow them to be more dynamic and enable better user experiences. In recent years, the popularity and importance of web applications have continually increased, and they are now very commonly used to complete essential tasks such as financial transactions. As ...

A Scalable P2P RIA Crawling System with Partial Knowledge

Rich Internet Applications are widely used as they are interactive and user friendly. Automated tools for crawling Rich Internet Applications have become necessary for many reasons, such as content indexing or testing for correctness and security. Due to the large size of RIAs, distributed crawling has been introduced to reduce the amount of time required for crawling. However, having one controlle...

Publication date: 2012